[receiver/awslambda] Add multi-format S3 log routing#47237
MichaelKatsoulis wants to merge 14 commits into open-telemetry:main
@axw This is the first implementation of the multi-format log support in the awslambda receiver. One thing that I am not sure about is the
receiver/awslambdareceiver/match.go
patternParts := strings.Split(pattern, "/")
targetParts := strings.Split(target, "/")
We'll be doing a lot of pattern matching, should we optimise this?
- The pattern can be split at receiver construction time, so we only create a slice once per pattern
- The object key can be split once per S3 event, rather than once per encoding pattern
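The two optimisations above could look roughly like the following. This is a minimal sketch, not the PR's code: `compiledPattern`, `compilePatterns`, `matchParts` and `firstMatch` are hypothetical names, and it assumes a pattern matches when its segments match the leading segments of the object key, with `*` accepting any single segment.

```go
package main

import (
	"fmt"
	"strings"
)

// compiledPattern is a hypothetical type: a pattern that has been split into
// path segments once, at receiver construction time.
type compiledPattern struct {
	raw   string
	parts []string
}

// compilePatterns splits every configured pattern exactly once.
func compilePatterns(patterns []string) []compiledPattern {
	out := make([]compiledPattern, 0, len(patterns))
	for _, p := range patterns {
		out = append(out, compiledPattern{raw: p, parts: strings.Split(p, "/")})
	}
	return out
}

// matchParts reports whether the pattern segments match the leading segments
// of the key. A "*" segment accepts any single key segment (assumed semantics).
func matchParts(patternParts, keyParts []string) bool {
	if len(patternParts) > len(keyParts) {
		return false
	}
	for i, p := range patternParts {
		if p != "*" && p != keyParts[i] {
			return false
		}
	}
	return true
}

// firstMatch splits the object key once per S3 event and then tests the
// resulting segments against each pre-split pattern.
func firstMatch(patterns []compiledPattern, key string) (string, bool) {
	keyParts := strings.Split(key, "/")
	for _, cp := range patterns {
		if matchParts(cp.parts, keyParts) {
			return cp.raw, true
		}
	}
	return "", false
}

func main() {
	patterns := compilePatterns([]string{"AWSLogs/*/vpcflowlogs", "AWSLogs/*/CloudTrail"})
	match, ok := firstMatch(patterns, "AWSLogs/123456789012/CloudTrail/us-east-1/file.json.gz")
	fmt.Println(match, ok)
}
```

With this shape, the only per-event allocation for matching is the single split of the object key, regardless of how many patterns are configured.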
receiver/awslambdareceiver/match.go
}
if strings.Contains(p, "*") {
    // Glob matching within a segment (e.g. "eni-*", "*_CloudTrail_*").
    matched, _ := path.Match(p, targetParts[i])
This will also handle "?" and character classes etc. Do we want that?
If yes, we should document it. If no, we should probably create a variation of path.Match that only handles *. I think we'd be better off simplifying.
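To make the concern concrete, here is a small sketch of what the standard library's `path.Match` accepts beyond `*`, together with a hypothetical guard (`hasExtraGlobSyntax`, not part of the PR) that could be used at config validation time to reject patterns relying on that extra syntax.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// hasExtraGlobSyntax is a hypothetical helper. It reports whether a pattern
// uses path.Match metacharacters other than "*": "?" (single character),
// "[" (character class) or "\" (escape).
func hasExtraGlobSyntax(pattern string) bool {
	return strings.ContainsAny(pattern, `?[\`)
}

func main() {
	// "?" matches any single non-separator character.
	questionMark, _ := path.Match("eni-?", "eni-1")

	// Character classes are interpreted as well.
	charClass, _ := path.Match("eni-[0-9]*", "eni-0abc")

	// A malformed pattern returns an error, which is lost if the caller
	// discards the second return value.
	_, err := path.Match("eni-[", "eni-[")

	fmt.Println(questionMark, charClass, err != nil)
	fmt.Println(hasExtraGlobSyntax("eni-*"), hasExtraGlobSyntax("eni-?"))
}
```

If only `*` is meant to be supported, rejecting such patterns up front would keep the documented behaviour and the actual behaviour aligned.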
receiver/awslambdareceiver/match.go
// matchPrefixWithWildcard("eni-0abc123-all", "eni-*") => true
// matchPrefixWithWildcard("123_CloudTrail_us-east-1", "*_CloudTrail_*") => true
// matchPrefixWithWildcard("any/path", "*") => true
func matchPrefixWithWildcard(target, pattern string) bool {
Would we be better off supporting only a single * or an exact match for S3 path parts? It seems unnecessary to support glob matching in the middle of a path part.
That would simplify the matching -- no need for path.Match, just an exact match or accept anything. I think that makes things easier to reason about. We could also optimise the code further by using a trie (prefix tree), but that's probably overkill given how much time will be spent in matching vs. I/O and parsing.
If we did this, then I think we'll end up with two flavours of patterns:
- For S3 object keys: path matching with either exact path part match or full wildcard part match.
- For CloudWatch log group and stream names: prefix and/or suffix wildcard only (*_foo_*, but not foo*bar)
WDYT?
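A rough sketch of the two flavours described above, under stated assumptions: the function names `matchKeyPrefix` and `matchAffixWildcard` are hypothetical, S3 patterns are assumed to match the leading path parts of the key, and a `*` in the middle of a CloudWatch pattern is treated as a literal character rather than a wildcard (rejecting it during config validation would be another option).

```go
package main

import (
	"fmt"
	"strings"
)

// matchKeyPrefix sketches the S3 flavour: each pattern part is either an
// exact match or a lone "*" that accepts any single path part.
func matchKeyPrefix(pattern, key string) bool {
	patternParts := strings.Split(pattern, "/")
	keyParts := strings.Split(key, "/")
	if len(patternParts) > len(keyParts) {
		return false
	}
	for i, p := range patternParts {
		if p != "*" && p != keyParts[i] {
			return false
		}
	}
	return true
}

// matchAffixWildcard sketches the CloudWatch flavour: "*" is honoured only
// as the first and/or last character of the pattern.
func matchAffixWildcard(pattern, name string) bool {
	prefixStar := strings.HasPrefix(pattern, "*")
	suffixStar := len(pattern) > 1 && strings.HasSuffix(pattern, "*")
	core := pattern
	if prefixStar {
		core = core[1:]
	}
	if suffixStar {
		core = core[:len(core)-1]
	}
	switch {
	case prefixStar && suffixStar:
		return strings.Contains(name, core)
	case prefixStar:
		return strings.HasSuffix(name, core)
	case suffixStar:
		return strings.HasPrefix(name, core)
	default:
		return name == core
	}
}

func main() {
	fmt.Println(matchKeyPrefix("AWSLogs/*/CloudTrail", "AWSLogs/123456789012/CloudTrail/us-east-1/x.json.gz"))
	fmt.Println(matchAffixWildcard("*_foo_*", "123_foo_bar"))
	fmt.Println(matchAffixWildcard("foo*bar", "fooXbar"))
}
```

Neither matcher needs `path.Match`, so there is no hidden support for `?` or character classes and no error value to discard.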
I agree with all the simplification suggestions.
I was initially thinking, "why not support that too?", but that seems to be overkill and complicates things.
…m:MichaelKatsoulis/opentelemetry-collector-contrib into feature/awslambdareceiver-s3-multi-format
@axw after doing these two main things:
we can see clear performance gains. Clear difference:
Description
This PR adds support for routing S3 objects to different encoding extensions based on their key prefix within a single Lambda deployment.
This is useful when a Lambda receives events from S3 buckets that store multiple log types (e.g. VPC Flow Logs and CloudTrail in the same bucket, or across multiple buckets with different log types).
We introduce a new encodings field in the S3 receiver config that can be used like this:
The existing encoding field is unchanged. encoding and encodings are mutually exclusive.
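The mutual-exclusivity rule could be enforced during config validation along these lines. This is only an illustrative sketch: the `s3Config` and `encodingRule` types, their field names, and the `validate` method are assumptions made for the example and are not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// encodingRule is a hypothetical entry of the encodings list: a key pattern
// and the encoding extension to use for matching objects.
type encodingRule struct {
	Pattern  string
	Encoding string
}

// s3Config is a hypothetical stand-in for the S3 section of the receiver config.
type s3Config struct {
	Encoding  string         // existing single-encoding field
	Encodings []encodingRule // new multi-format routing field
}

// validate rejects configurations that set both fields at once.
func (c s3Config) validate() error {
	if c.Encoding != "" && len(c.Encodings) > 0 {
		return errors.New("encoding and encodings are mutually exclusive")
	}
	return nil
}

func main() {
	both := s3Config{
		Encoding:  "example_encoding",
		Encodings: []encodingRule{{Pattern: "AWSLogs/*/CloudTrail", Encoding: "example_encoding"}},
	}
	fmt.Println(both.validate())
	fmt.Println(s3Config{Encoding: "example_encoding"}.validate())
}
```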
Link to tracking issue
Part of #46458
Testing
Unit Testing.
Pending E2E testing.
Documentation
Readme has been updated.